Primary Biodiversity Data

Observations of where a species occurs are the fundamental unit of biodiversity data. In this unit we will explore where to find open-access occurrence data, how to access those sources from R, and tools for visualizing the point distributions of species.

Library ‘spocc’

spocc is a tool from the rOpenSci consortium (a group of developers building R capacity for open science) that queries several occurrence databases, including GBIF, through a single interface.

Package details on GitHub

Tutorial here

We should all have spocc installed, but if not try:

install.packages('spocc')

With spocc installed we can try a simple query of the GBIF database that we have seen briefly before.

library(spocc)
ulmus <- occ(query='Ulmus americana', from='gbif')

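The data are returned as an S3 object of class “occdat”: a list with one element per data source, each holding the query metadata and the records themselves. The records are stored as a tidyverse tibble (a data frame variant with stricter printing and subsetting behavior).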

print(ulmus) ## Not obvious what or where the data are
View(ulmus)

Maybe it’s still not obvious how to get at the records. To access an element of the returned object, use the “$” operator and refer to each element by name. In general it is easier to convert the records to a regular R data frame, since not everything we want to do with these data is compatible with the tidyverse/spocc formatting.

df = as.data.frame(occ2df(ulmus$gbif))

#Also try:
#head(df)
#colnames(df) #!! That's a lot of columns!!
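
To make the nesting concrete, here is a minimal sketch that uses a hand-built list mimicking the layout of an occ() result: one element per source, then “meta” and “data”, then one table per query. The object `toy` and all of its values are invented for illustration; a real result has many more fields, and exact names may vary between spocc versions.

```r
# Toy stand-in for an occ() result; all values are invented for illustration.
toy <- list(
  gbif = list(
    meta = list(source = "gbif", returned = 3),
    data = list(
      Ulmus_americana = data.frame(
        name      = rep("Ulmus americana", 3),
        longitude = c(-90.1, -83.5, -75.2),
        latitude  = c(38.6, 42.3, 40.0)
      )
    )
  )
)

names(toy)        # top level: one element per data source
names(toy$gbif)   # each source holds "meta" and "data"

# Chain "$" to walk down to the table of records for one query
recs <- toy$gbif$data$Ulmus_americana
nrow(recs)
```

With a live query, the same chain of “$” calls on the object returned by occ() should lead to the tibble of records, which is what occ2df() flattens for us.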

mapr: Leaflet mapping of species distribution data.

To create interactive graphics showing species occurrence locations and some metadata we can use ‘mapr’. This library uses the JavaScript library Leaflet and map tiles from OpenStreetMap (and other providers) to create interactive maps: you can pan and zoom, and click on a point to pop up the metadata for that occurrence.

If not already done:

install.packages('mapr')

Then call map_leaflet() either on the spocc object:

library(mapr)
map_leaflet(ulmus)

OR with the data.frame:

map_leaflet(df)

‘mapr’ shows the data for the first few columns in each pop-up tab. We can control what is shown there by only passing some columns to map_leaflet().

map_leaflet(df[,c('name', 'longitude', 'latitude', 'stateProvince', 'country', 'year', 'occurrenceID')])

Specifying columns makes it much easier to sift through large amounts of data to check sources and look for patterns of bias.
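
As a rough illustration of looking for bias, the sketch below tabulates records by region and by year using base R. The data frame `occ_toy` is invented for this example; with real data you would use the corresponding columns of `df`, assuming they are present in your download.

```r
# Toy occurrence table; values are invented for illustration.
occ_toy <- data.frame(
  name          = rep("Ulmus americana", 6),
  stateProvince = c("Ohio", "Ohio", "Ohio", "Ontario", "Texas", NA),
  year          = c(2019, 2019, 2020, 2020, 1950, 2019)
)

# Records per state/province, most-sampled first; missing values get their own count
by_state <- sort(table(occ_toy$stateProvince, useNA = "ifany"), decreasing = TRUE)
by_state

# Records per year; a pile-up in recent years often reflects observer effort
by_year <- table(occ_toy$year)
by_year
```

Heavily uneven counts do not prove the species is more common in one place or period; they may only show where and when people were recording.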

NOTE: mapr only works with data formatted by spocc and related libraries; a plain data frame needs at least ‘name’, ‘longitude’, and ‘latitude’ columns.

More with spocc queries.

Do you notice something odd when you run:

nrow(df)
## [1] 500

Check how many records are returned for the same search on the GBIF website.

Our query only returned the first 500 records because that is the default for the occ() function.

We can fix that:

# limit raises the cap on records returned per source (the default is 500)
ulmus2 <- occ(query='Ulmus americana', from='gbif', limit=2500)
map_leaflet(ulmus2)
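
One way to notice a truncated query without visiting the website is to compare how many records the source reports with how many came back. The sketch below assumes the result stores these counts as `meta$found` and `meta$returned` for each source, which is how spocc results are laid out as far as I know; `count_summary()` is a hypothetical helper, and `mock` is a stand-in with invented numbers so the code runs offline.

```r
# Hypothetical helper: records available at the source vs. records downloaded.
# Assumes each source carries meta$found and meta$returned.
count_summary <- function(x, source = "gbif") {
  meta <- x[[source]]$meta
  c(found = meta$found, returned = meta$returned)
}

# Stand-in object with invented numbers; with a live query,
# pass the object returned by occ() instead.
mock <- list(gbif = list(meta = list(found = 12000, returned = 500)))

counts <- count_summary(mock)
counts

# TRUE means the download was cut off by the limit
counts[["returned"]] < counts[["found"]]
```

If the counts differ, raising `limit` as above (or paging through results) is needed to get the full set.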

Meeting up with WorldClim

R gives us the tools to plot spatial point data like our occurrence records over a geospatial raster object like the WorldClim data. R also gives us tools to access the raster data for our occurrence points.

Load WorldClim again. (If you are working in the same folder as before, the data will not need to be downloaded again, so this should be quick.)

library(raster)
## Loading required package: sp
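# Bioclim layers at 5 arc-minute resolution, cached in the working directory.
# Recent raster releases may warn that getData() is moving to the 'geodata' package.
wc = getData('worldclim', var='bio', res = 5)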

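Plot the raster with the occurrence points overlaid:

# Crop to a window over North America: extent(xmin, xmax, ymin, ymax)
ext = extent(-125, -55, 20, 60)
wc2 = crop(wc, ext)
# Layer 12 is bio12, annual precipitation
plot(wc2[[12]], col = topo.colors(99))
points(df$longitude, df$latitude)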
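
Occurrence downloads often include records with missing coordinates or points that fall outside the map window. The sketch below shows one way to filter them in base R before plotting or extracting values. The data frame `pts` is invented for illustration, and the bounds simply repeat the extent used for cropping.

```r
# Toy coordinates; values are invented for illustration.
pts <- data.frame(
  longitude = c(-90.1, NA, -75.2, 10.0),
  latitude  = c(38.6, 42.3, NA, 50.0)
)

# Rows where both coordinates are present
has_xy <- complete.cases(pts[, c("longitude", "latitude")])

# Rows inside the map window (same bounds as the cropping extent)
in_box <- pts$longitude >= -125 & pts$longitude <= -55 &
          pts$latitude  >= 20   & pts$latitude  <= 60

# FALSE & NA is FALSE, so rows with missing coordinates are dropped cleanly
clean <- pts[has_xy & in_box, ]
clean
```

Only the first toy record survives: two rows have a missing coordinate and one lies east of the window.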

Extracting climate data

The ‘raster’ library provides the extract() function, which looks up the climate values stored in our raster at every occurrence point. These values are the primary input to our species distribution models.

# Pass the coordinates as a two-column matrix (x = longitude, y = latitude).
# c(df$longitude, df$latitude) would be read as one long vector of cell numbers.
extr = raster::extract(wc2, cbind(df$longitude, df$latitude))
summary(extr) # one summary column per bioclim layer (bio1 to bio19)
boxplot(extr[,'bio12'], main='Annual precipitation (bio12) at occurrences of Ulmus americana')
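
The shape of the coordinate argument matters: extract() treats a two-column matrix as x/y pairs, while a bare numeric vector is treated as cell numbers. The base R sketch below, using invented coordinates, shows the difference between the two ways of combining longitude and latitude.

```r
# Toy coordinates; values are invented for illustration.
lon <- c(-90.1, -83.5, -75.2)
lat <- c(38.6, 42.3, 40.0)

as_vector <- c(lon, lat)      # one vector of 6 numbers; the pairing is lost
as_matrix <- cbind(lon, lat)  # 3 x 2 matrix; one row per point

length(as_vector)
dim(as_matrix)
as_matrix
```

A quick sanity check after extraction is that the result has one row per occurrence record, that is, `nrow(extr)` equals `nrow(df)`.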

Challenge:

Query iNaturalist for Ulmus americana records. (Hint: you may want to look at the rinat library and the get_inat_obs() function).

Assignments:

Reading:

Check out the ENMeval paper before moving on to the next section.

Assignment (PART 1):

Using spocc and mapr, create a leaflet map of a species of interest to you. Make sure you get at least 20 unique locations where an occurrence is recorded. Record all of your commands in an R script. Save your leaflet map as a webpage. Post the script and your map to GitHub (you may want to create a repository for this course on your account). When that is up, post a link to your repository on Slack using the biodiversity channel.
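
A minimal sketch of how one might count unique locations, assuming the data frame has ‘longitude’ and ‘latitude’ columns as in the examples above. The data frame `obs` is invented for illustration.

```r
# Toy records; values are invented for illustration.
obs <- data.frame(
  longitude = c(-90.1, -90.1, -83.5, -75.2, NA),
  latitude  = c(38.6, 38.6, 42.3, 40.0, 41.0)
)

# Drop rows with a missing coordinate, then keep one row per distinct pair
xy <- obs[complete.cases(obs), c("longitude", "latitude")]
n_unique <- nrow(unique(xy))
n_unique
```

Here the five toy records reduce to three unique locations: one is an exact duplicate and one has no longitude.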
